73 research outputs found

    Self-Supervised Vision-Based Detection of the Active Speaker as Support for Socially-Aware Language Acquisition

    Full text link
    This paper presents a self-supervised method for visual detection of the active speaker in a multi-person spoken interaction scenario. Active speaker detection is a fundamental prerequisite for any artificial cognitive system attempting to acquire language in social settings. The proposed method is intended to complement the acoustic detection of the active speaker, thus improving the system robustness in noisy conditions. The method can detect an arbitrary number of possibly overlapping active speakers based exclusively on visual information about their face. Furthermore, the method does not rely on external annotations, thus complying with cognitive development. Instead, the method uses information from the auditory modality to support learning in the visual domain. This paper reports an extensive evaluation of the proposed method using a large multi-person face-to-face interaction dataset. The results show good performance in a speaker dependent setting. However, in a speaker independent setting the proposed method yields a significantly lower performance. We believe that the proposed method represents an essential component of any artificial cognitive system or robotic platform engaging in social interactions.Comment: 10 pages, IEEE Transactions on Cognitive and Developmental System

    Interactive Robot Learning of Gestures, Language and Affordances

    Full text link
    A growing field in robotics and Artificial Intelligence (AI) research is human-robot collaboration, whose target is to enable effective teamwork between humans and robots. However, in many situations human teams are still superior to human-robot teams, primarily because human teams can easily agree on a common goal with language, and the individual members observe each other effectively, leveraging their shared motor repertoire and sensorimotor resources. This paper shows that for cognitive robots it is possible, and indeed fruitful, to combine knowledge acquired from interacting with elements of the environment (affordance exploration) with the probabilistic observation of another agent's actions. We propose a model that unites (i) learning robot affordances and word descriptions with (ii) statistical recognition of human gestures with vision sensors. We discuss theoretical motivations, possible implementations, and we show initial results which highlight that, after having acquired knowledge of its surrounding environment, a humanoid robot can generalize this knowledge to the case when it observes another agent (human partner) performing the same motor actions previously executed during training.Comment: code available at https://github.com/gsaponaro/glu-gesture

    Cluster Analysis of Differential Spectral Envelopes on Emotional Speech

    Get PDF
    This paper reports on the analysis of the spectral variation of emotional speech. Spectral envelopes of time aligned speech frames are compared between emotionally neutral and active utterances. Statistics are computed over the resulting differential spectral envelopes for each phoneme. Finally, these statistics are classified using agglomerative hierarchical clustering and a measure of dissimilarity between statistical distributions and the resulting clusters are analysed. The results show that there are systematic changes in spectral envelopes when going from neutral to sad or happy speech, and those changes depend on the valence of the emotional content (negative, positive) as well as on the phonetic properties of the sounds such as voicing and place of articulation

    S-HR-VQVAE: Sequential Hierarchical Residual Learning Vector Quantized Variational Autoencoder for Video Prediction

    Full text link
    We address the video prediction task by putting forth a novel model that combines (i) our recently proposed hierarchical residual vector quantized variational autoencoder (HR-VQVAE), and (ii) a novel spatiotemporal PixelCNN (ST-PixelCNN). We refer to this approach as a sequential hierarchical residual learning vector quantized variational autoencoder (S-HR-VQVAE). By leveraging the intrinsic capabilities of HR-VQVAE at modeling still images with a parsimonious representation, combined with the ST-PixelCNN's ability at handling spatiotemporal information, S-HR-VQVAE can better deal with chief challenges in video prediction. These include learning spatiotemporal information, handling high dimensional data, combating blurry prediction, and implicit modeling of physical characteristics. Extensive experimental results on the KTH Human Action and Moving-MNIST tasks demonstrate that our model compares favorably against top video prediction techniques both in quantitative and qualitative evaluations despite a much smaller model size. Finally, we boost S-HR-VQVAE by proposing a novel training method to jointly estimate the HR-VQVAE and ST-PixelCNN parameters.Comment: 14 pages, 7 figures, 3 tables. Submitted to IEEE Transactions on Pattern Analysis and Machine Intelligence on 2023-07-1

    Analisi gerarchica degli inviluppi spettrali differenziali di una voce emotiva

    Get PDF
    .In questo articolo viene descritto un nuovo metodo di analisi del timbro vocale tramite lo studio delle variazioni di inviluppo spettrale utilizzato da uno stesso parlatore in situazioni emotiva neutra o espressiva. Il contesto dell\u27analisi riguarda un corpus di un solo parlatore istruito a leggere una serie di frasi utilizzando uno stile di lettura neutro e successivamente utilizzando due modalit? emotive: uno stile allegro e uno stile triste. Gli inviluppi spettrali relativi alle versioni allineate delle realizzazioni vocali neutre e espressive (allegra e triste) sono confrontati utilizzando un metodo differenziale. Le differenze sono state calcolate tra lo stato emotivo e quello neutro, di conseguenza le due categorie messe a confronto sono neutro-allegro e neutro-triste. La statistica degli inviluppi differenziali ? stata calcolata per ogni fono. I dati sono stati esaminati utilizzando un metodo di clustering gerarchico di tipo agglomerativo. I cluster risultanti sono avvalorati con diverse misure di distanza tra le distribuzioni statistiche ed esplorati visivamente per trovare similitudini e differenze tra le due categorie. I risultati mettono in evidenza sistematiche variazioni nel timbro vocale relative ai due insiemi di differenze di inviluppi spettrali. Questi tratti dipendono dalla valenza dell\u27emozione presa in considerazione (positiva, negativa) come dalle propriet? fonetiche del particolare fono come ad esempio sonorit? e luogo di articolazione

    User Evaluation of the SYNFACE Talking Head Telephone

    Get PDF
    Abstract. The talking-head telephone, Synface, is a lip-reading support for people with hearing-impairment. It has been tested by 49 users with varying degrees of hearing-impaired in UK and Sweden in lab and home environments. Synface was found to give support to the users, especially in perceiving numbers and addresses and an enjoyable way to communicate. A majority deemed Synface to be a useful product.

    Bidirectional fluxes of spermine across the mitochondrial membrane.

    Get PDF
    The polyamine spermine is transported into the mitochondrial matrix by an electrophoretic mechanism having as driving force the negative electrical membrane potential (DW). The presence of phosphate increases spermine uptake by reducingDpH and enhancingDW. The transport system is a specific uniporter constituted by a protein channel exhibiting two asymmetric energy barriers with the spermine binding site located in the energy well between the two barriers. Although spermine transport is electrophoretic in origin, its accumulation does not follow the Nernst equation for the presence of an efflux pathway. Spermine efflux may be induced by different agents, such as FCCP, antimycin A and mersalyl, able to completely or partially reduce theDWvalue and, consequently, suppress or weaken the force necessary to maintain spermine in the matrix. However this efflux may also take place in normal conditions when the electrophoretic accumulation of the polycationic polyamine induces a sufficient drop inDWable to trigger the efflux pathway. The release of the polyamine is most probably electroneutral in origin and can take place in exchange with protons or in symport with phosphate anion. The activity of both the uptake and efflux pathways induces a continuous cycling of spermine across the mitochondrial membrane, the rate of which may be prominent in imposing the concentrations of spermine in the inner and outer compartment. Thus, this event has a significant role on mitochondrial permeability transition modulation and consequently on the triggering of intrinsic apoptosis
    • …
    corecore